BUG: Add min/max methods to ArrowExtensionArray GH#61311 #61924

Merged
merged 4 commits into from
Aug 5, 2025

Conversation

skonda29
Contributor

@skonda29 skonda29 commented Jul 22, 2025

The core problem: when a PyArrow-backed Series is passed to .iloc, pandas' indexing validation calls min() and max() on the underlying ArrowExtensionArray for bounds checking. These methods are not implemented there, so the call fails with AttributeError: 'ArrowExtensionArray' object has no attribute 'max'. The same indexing works with regular NumPy-backed DataFrames, so the PyArrow backend behaves inconsistently.

Proposed solution:
I modified _validate_key in pandas/core/indexing.py to detect ExtensionArrays and convert them to NumPy arrays using to_numpy() or np.asarray(). I also added a test case to pandas/tests/indexing/test_iloc.py that reproduces the issue and verifies the fix.

Member

@mroeschke mroeschke left a comment

Thanks for the pull request, but the proper fix is for _validate_key to handle key: ExtensionArray correctly.

@skonda29
Contributor Author

@mroeschke Thank you for your suggestion. I will rework this PR to implement the _validate_key fix instead.

@simonjayhawkins simonjayhawkins added Bug Indexing Related to indexing on series/frames, not to indexes themselves Arrow pyarrow functionality labels Jul 23, 2025
@skonda29 skonda29 force-pushed the skonda29-issue-61311 branch from 117175b to f87de21 Compare July 30, 2025 15:19
@skonda29
Contributor Author

@mroeschke Would you mind taking a look at this PR when you get a chance? I've added a conversion to NumPy in _validate_key, and included a test case.

Feedback is appreciated!

Comment on lines 1613 to 1617
# convert to numpy array for min/max with ExtensionArrays
if hasattr(arr, "to_numpy"):
    np_arr = arr.to_numpy()
else:
    np_arr = np.asarray(arr)
Member

Check whether arr is an ExtensionArray with isinstance(arr.dtype, ExtensionDtype), and use arr._reduce("max") instead.
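
The dispatch suggested in this review comment can be sketched without pandas. The following is a minimal toy model, not the merged implementation: ToyExtensionArray, its is_extension flag (standing in for the isinstance(arr.dtype, ExtensionDtype) check), and validate_positions are all hypothetical names invented for illustration, and the bounds condition is an assumption about what the validation checks.

```python
class ToyExtensionArray:
    """Hypothetical stand-in for an extension array.

    Like the ArrowExtensionArray in the report, it has no ``max``/``min``
    methods; reductions are only reachable through ``_reduce``.
    """

    # Stand-in for isinstance(arr.dtype, ExtensionDtype) in the real check.
    is_extension = True

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def _reduce(self, name):
        # Dispatch a named reduction, mirroring the arr._reduce("max") call.
        reducers = {"max": max, "min": min}
        return reducers[name](self._values)


def validate_positions(arr, len_axis):
    """Raise IndexError if any position in ``arr`` falls outside the axis."""
    if len(arr) == 0:
        return
    if getattr(arr, "is_extension", False):
        # Extension arrays: go through _reduce instead of .max()/.min().
        largest, smallest = arr._reduce("max"), arr._reduce("min")
    else:
        # Plain sequences: the built-in reductions are enough here.
        largest, smallest = max(arr), min(arr)
    if largest >= len_axis or smallest < -len_axis:
        raise IndexError("positional indexers are out-of-bounds")


key = ToyExtensionArray([0, 2])
validate_positions(key, len_axis=3)  # positions 0 and 2 fit a 3-column axis
```

Routing through _reduce avoids materializing a NumPy copy of the key just to find its extremes, which is presumably why it was preferred over the to_numpy() conversion.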


df_arrow = df.convert_dtypes(dtype_backend="pyarrow")
result = df_arrow.iloc[:, df_arrow["c"]]
expected = df_arrow.iloc[:, [0, 2]]
Member

Construct this using DataFrame(...) instead

@skonda29
Contributor Author

skonda29 commented Aug 4, 2025

@mroeschke Please take a look at this implementation. I've implemented your suggestions.

@mroeschke mroeschke added this to the 3.0 milestone Aug 5, 2025
@mroeschke mroeschke merged commit 618de88 into pandas-dev:main Aug 5, 2025
40 of 43 checks passed
@mroeschke
Member

Thanks @skonda29

Labels
Arrow pyarrow functionality Bug Indexing Related to indexing on series/frames, not to indexes themselves
Development

Successfully merging this pull request may close these issues.

BUG: 'ArrowExtensionArray' object has no attribute 'max' when passing pyarrow-backed series to .iloc
3 participants